Method of and device for the recognition, without previous training, of connected words belonging to small
vocabularies.

The method consists in classifying the sounds forming
the uttered words into eight phonetic classes, plus a possible
indication of the presence of diphthongs, starting from an
acoustic-phonetic analysis of the sounds themselves.

To recognize the uttered words, the sequence of classes
found is analysed by search-tree algorithms of pattern
matching with sequences of classes corresponding to
vocabulary words, and possibly by dynamic programming
algorithms.

The detected classes are: silence, voiced fricatives,
unvoiced fricatives, plosives, affricates, nasals, semivowels,
and vowels.

A device for implementing the method is also described.






CSELT 

Centro Studi e Laboratori 
Telecomunicazioni S.p.A.
Turin, Italy 






Method of and Device for the Recognition, without Previous Training,
of Connected Words Belonging to Small Vocabularies



Description

The present invention relates to speech recognition systems,
and more particularly it concerns a method of and a device for
recognizing, without previous training, connected words belonging to
small vocabularies.

Speech recognition can be approached either by pattern
matching or by acoustic-phonetic analysis.

The pattern matching approach is based on the previous storage
of speaker-dependent templates characterising the overall acoustic
events to be recognized, and on the subsequent matching of these
templates with the speech signal to be recognized. The main
disadvantage of this approach is that template storage requires an
initial training phase, which may be either on-line (in the case of
speaker-dependent recognition) or off-line (in the case of
speaker-independent recognition); this phase is very time-consuming
and requires a large amount of memory.

Acoustic-phonetic recognition, on the contrary, is based on
the detailed determination of the acoustic-phonetic features of the
speech signal, and does not require any previous storage of reference
templates.

This technique is generally used for large-vocabulary
isolated-word recognition as a preliminary analysis, to simplify the
subsequent pattern matching phase; or it is used in the continuous
speech understanding domain as a preliminary analysis for classifying
the sounds into fundamental phonetic classes, useful to the following
step of recognition of the individual phonemes belonging to these
classes.

An example of the latter application is described in the
article by C.J. Weinstein et al., "A System for Acoustic-Phonetic
Analysis of Continuous Speech", IEEE Transactions on Acoustics, Speech
and Signal Processing, vol. ASSP-23, No. 1, February 1975, where the
sounds are preliminarily subdivided into four fundamental phonetic
classes, and afterwards hypotheses are made on the individual phonemes
belonging to these classes. Phoneme hypothesis reliability is not very
high because possible misinterpretations are recovered during the
subsequent higher-level processing phases (lexical, syntactic,
semantic interpretation).

The inventors have found that small-vocabulary,
speaker-independent word recognition does not require individual
phoneme detection, but only an accurate subdivision into phonetic
classes starting from an acoustic-phonetic word analysis; hence this
subdivision is the only step of the sound classification process.

The present invention concerns a small-vocabulary word
recognition method which, on the basis of the acoustic-phonetic
analysis of the uttered sounds, subdivides them into eight main
classes, plus an indication of the presence of diphthongs. The class
sequence is analyzed by a tree-search algorithm of pattern matching
with sequences of classes corresponding to the words of the
vocabulary, and possibly by dynamic programming algorithms.

Such a method is described in claim 1. 

A further object of the present invention is a device for
implementing the method, described in claim 8. The method which is
the object of the present invention is hereinafter described.

The signal is subdivided into successive intervals and each
interval is classified into one of the following eight phonetic
classes (hereinafter labelled by the symbol written to the right of
each of them): silence Q, voiced fricative Fv, unvoiced fricative Fn,
plosive P, affricate A, nasal N, semivowel S, vowel V (with possible
diphthong detection).

A recognition method of the words forming the speech signal
is applied to the obtained class sequence.

Provided the vocabulary of the words which can be recognized
is conveniently chosen, the above subdivision into eight classes is
sufficient to recognize each possible sequence of said words in a
speaker-independent mode. The subdivision into eight phonetic classes
is carried out as described hereinbelow.

First the speech signal is subdivided into equal time
intervals and digitised, obtaining, at each interval, N digital
samples s_n (1 ≤ n ≤ N).

A linear prediction coding (LPC) is applied to the digital
samples s_n of each interval. In other words, at each interval, the
linear prediction coefficients a_1, ..., a_P of the following function
are determined:

$$H(z) = \frac{1}{1 + \sum_{i=1}^{P} a_i \, z^{-i}} \qquad (1)$$

where z indicates the variable of the z-transform of the digital
samples, H(z) the transfer function modelling the vocal tract at each
interval by an all-pole digital filter, and P (1 ≤ i ≤ P) the order of
the digital filter.

The values ρ_i of the following normalized autocorrelation
function are then determined:

$$\rho_i = \frac{\sum_{n=0}^{N-1-i} s_n \, s_{n+i}}{\sum_{n=0}^{N-1} s_n^2} \qquad (2)$$

Values ρ_i are then used to solve the following linear
system of P equations:

$$\sum_{k=1}^{P} a_k \, \rho_{|i-k|} = -\rho_i , \qquad 1 \le i \le P \qquad (3)$$

The linear-prediction coding technique is known and described
in the book by L.R. Rabiner, R.W. Schafer: "Digital Processing of Speech
Signals", pages 396 ff., Englewood Cliffs, Prentice-Hall, 1978.
The normalized residual energy E_R is calculated at each interval
using the values a_i, ρ_i as follows:

$$E_R = 1 + \sum_{i=1}^{P} \rho_i \, a_i \qquad (4)$$
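As an illustration of relations (2) to (4), the following Python sketch computes the normalized autocorrelation of one interval, solves system (3) and evaluates the normalized residual energy. It is only a sketch under stated assumptions: the Levinson-Durbin recursion is one standard way of solving such a system (the text does not prescribe a solver), the function names and the test interval are hypothetical, and the residual energy is written as 1 + Σ ρ_i a_i, the form that follows from the sign conventions of (1) and (3).

```python
def normalized_autocorrelation(s, P):
    """Return [rho_0, ..., rho_P] as in relation (2); rho_0 is 1 by construction."""
    N = len(s)
    energy = sum(x * x for x in s)
    if energy == 0.0:
        raise ValueError("interval has zero energy")
    return [sum(s[n] * s[n + i] for n in range(N - i)) / energy
            for i in range(P + 1)]


def solve_lpc(rho, P):
    """Solve system (3) for a_1..a_P with the Levinson-Durbin recursion.

    Returns (a, err): a[i-1] holds a_i, err is the normalized prediction
    error left after the last recursion step.
    """
    a = []
    err = rho[0]
    for m in range(1, P + 1):
        acc = rho[m] + sum(a[k] * rho[m - 1 - k] for k in range(m - 1))
        k_m = -acc / err                       # reflection coefficient
        a = [a[k] + k_m * a[m - 2 - k] for k in range(m - 1)] + [k_m]
        err *= 1.0 - k_m * k_m
    return a, err


def residual_energy(rho, a):
    """Normalized residual energy E_R = 1 + sum_i rho_i * a_i (assumed sign)."""
    return 1.0 + sum(rho[i + 1] * a_i for i, a_i in enumerate(a))


# Hypothetical test interval: a decaying exponential, well modelled by one pole.
samples = [0.9 ** n for n in range(200)]
rho = normalized_autocorrelation(samples, 2)
coeffs, err = solve_lpc(rho, 2)
e_r = residual_energy(rho, coeffs)
```

For this test interval the first coefficient comes out close to -0.9 and the residual energy close to 0.19, i.e. most of the signal energy is explained by the predictor.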



Further, the values of the formant frequencies F_r, i.e. the
resonance frequencies of the transfer function H(z) defined by
relation (1), are calculated. Values F_r are given by the peaks of
function H(z), calculated point-by-point, and by applying known
parabolic-interpolation techniques.
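The peak-picking step can be sketched as follows. This is a minimal illustration, not the procedure of the cited literature: it assumes that H(z) is evaluated on the unit circle on a uniform frequency grid, that peaks are local maxima of the magnitude in dB, and that each peak is refined by fitting a parabola through three neighbouring points. The function name, the grid size and the sampling frequency are hypothetical.

```python
import cmath
import math


def formant_frequencies(a, fs, n_points=256):
    """Return peak frequencies (Hz) of |H| for coefficients a = [a_1, ..., a_P]."""
    mags = []
    for k in range(n_points + 1):
        w = math.pi * k / n_points                      # 0 .. pi rad/sample
        denom = 1.0 + sum(a_i * cmath.exp(-1j * w * (i + 1))
                          for i, a_i in enumerate(a))
        mags.append(-20.0 * math.log10(abs(denom)))     # |H| in dB
    peaks = []
    for k in range(1, n_points):
        if mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]:
            y0, y1, y2 = mags[k - 1], mags[k], mags[k + 1]
            # Vertex of the parabola through the three points, in grid units.
            offset = 0.5 * (y0 - y2) / (y0 - 2.0 * y1 + y2)
            peaks.append((k + offset) * fs / (2.0 * n_points))
    return peaks


# Hypothetical two-pole resonator tuned near 1000 Hz at fs = 8000 Hz.
r, theta = 0.98, 2.0 * math.pi * 1000.0 / 8000.0
resonator = [-2.0 * r * math.cos(theta), r * r]
found = formant_frequencies(resonator, 8000.0)
```

The interpolation lets the estimate fall between grid points, so a coarse grid can still give a usable formant value.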

To ensure a sufficient continuity between the formants
calculated at adjacent intervals, known formant tracking techniques
are used, which replace formant values which greatly differ from those
of adjacent intervals by interpolated values. These formant computing
techniques are described, e.g., in the book by J.D. Markel, A.H. Gray
Jr.: "Linear Prediction of Speech", Berlin, Springer-Verlag, 1976,
page 165 and following.

Together with the preceding operations, and again using linear
prediction techniques applied to the low-pass-filtered digital samples
s_n, the speech signal of each interval is classified as voiced or
unvoiced by applying the algorithm known in the art by the acronym
SIFT (Simplified Inverse Filter Tracking), which consists of the
following steps:

- the digital samples are low-pass filtered and then sub-sampled;
- equations (2), (3) are applied again to the sub-sampled signal,
  thus obtaining new ρ_i, a_i values;
- the sub-sampled signal is further digitally filtered according to
  the inverse of the transfer function of relation (1), referred to
  as the inverse filter, using the just-computed coefficients a_i,
  thus obtaining the residual signal r_x, with x identifying the
  sample of the sub-sampled digital signal (1 ≤ x ≤ M);
- the autocorrelation function of the residual signal r_x is
  computed:

$$R_x = \sum_{k=1}^{M-1-x} r_k \, r_{k+x} \qquad (5)$$

- R_x peaks are looked for and compared with a threshold;
- a signal SF is generated, indicating "threshold exceeded" if at
  least one of said peaks exceeds the threshold; "threshold
  exceeded" corresponds to a voiced sound, the opposite to an
  unvoiced sound.

The algorithm is of known type, as described e.g. in the
above-cited book by J.D. Markel, A.H. Gray, page 197 and following.
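The last steps of the voiced/unvoiced decision (autocorrelation of the residual, peak search, threshold comparison) can be sketched as below. The sketch starts from an already available residual signal and leaves out the low-pass filtering, sub-sampling and inverse filtering. Normalizing the peaks by the zero-lag value, the lag search range, the threshold value and all names are assumptions made for the illustration.

```python
import random


def residual_autocorrelation(r, max_lag):
    """Autocorrelation of the residual for lags 0..max_lag (0-based indexing)."""
    m = len(r)
    return [sum(r[k] * r[k + x] for k in range(m - x))
            for x in range(max_lag + 1)]


def threshold_exceeded(r, min_lag, max_lag, threshold=0.4):
    """Signal SF: True if some autocorrelation peak exceeds the threshold."""
    R = residual_autocorrelation(r, max_lag + 1)
    if R[0] <= 0.0:
        return False                      # empty or all-zero residual
    for x in range(max(min_lag, 1), max_lag + 1):
        is_peak = R[x] > R[x - 1] and R[x] >= R[x + 1]
        if is_peak and R[x] / R[0] > threshold:
            return True                   # periodic residual: voiced sound
    return False                          # no strong peak: unvoiced sound


# Hypothetical residuals: a pulse train (period 40 samples) and white noise.
pulse_train = [1.0 if n % 40 == 0 else 0.0 for n in range(200)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
sf_pulses = threshold_exceeded(pulse_train, 20, 100)
sf_noise = threshold_exceeded(noise, 20, 100)
```

A periodic residual produces a strong autocorrelation peak at the pitch period, whereas a noise-like residual does not, which is what the threshold comparison exploits.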

Together with the preceding operations, the fast Fourier
transform of the digital samples s_n of each interval is computed, to
determine the following energy values referred to an interval:

- total energy E_T, extended to the overall frequency band of the
  original signal;
- energy E_M of the intermediate band;
- energy E_H of the high frequencies;
- energy E_L of the low frequencies.

Then each interval is classified as silence Q or voice by
applying the following algorithm:

- a reference energy value is calculated as
  E_RIF = a (E_T - E_TI), where E_T is the total energy, E_TI is the
  initial mean total energy extended to the first five intervals
  considered, which hence takes into account background noise, and a
  is a constant factor;
- the residual energy E_R is compared with E_RIF; if E_R > E_RIF the
  interval is classified as silence Q.
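The silence test can be sketched in a few lines. The reference energy is taken as E_RIF = a (E_T - E_TI), with E_TI the mean total energy of the first five intervals; the value of the constant a, the function name and the sample data are hypothetical.

```python
def classify_silence(total_energy, residual_energy, a=0.01, n_init=5):
    """Label each interval 'Q' (silence) or 'voice'.

    total_energy[j]    : total energy E_T of interval j
    residual_energy[j] : normalized residual energy E_R of interval j
    """
    e_ti = sum(total_energy[:n_init]) / n_init      # initial mean total energy
    labels = []
    for e_t, e_r in zip(total_energy, residual_energy):
        e_rif = a * (e_t - e_ti)                    # reference energy
        labels.append("Q" if e_r > e_rif else "voice")
    return labels


# Hypothetical data: five background-noise intervals, one loud interval, one quiet.
e_t = [1.0, 1.0, 1.0, 1.0, 1.0, 101.0, 1.0]
e_r = [0.5, 0.5, 0.5, 0.5, 0.5, 0.2, 0.5]
labels = classify_silence(e_t, e_r)
```

Subtracting E_TI makes the reference follow the background noise level, so an interval is taken as voice only when its energy rises clearly above that level while its residual energy stays low.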

The ratio R = E_L/E_H between the energies at low and high
frequencies is calculated and then compared with a threshold. Then a
subdivision is effected into four levels indicating the voicing degree
of the speech signal at each interval, starting from the analysis of
signal SF and of ratio R, according to the following table:

R                        SF                       Speech-signal voicing degree
Threshold exceeded       Threshold exceeded       Voiced signal
Threshold exceeded       Threshold not exceeded   Quasi voiced signal
Threshold not exceeded   Threshold exceeded       Quasi unvoiced signal
Threshold not exceeded   Threshold not exceeded   Unvoiced signal

Starting from the energy at high frequencies of the preceding
interval, E_H(j-1), and of the subsequent interval, E_H(j+1), the value
SSF(j) of the spectral stability function is computed, defined by the
following formula:

$$SSF(j) = \frac{\left| 10 \log E_H(j+1) - 10 \log E_H(j-1) \right|}{\varepsilon + \left| 10 \log E_H(j+1) + 10 \log E_H(j-1) \right|} \qquad (6)$$

where j is the interval index; ε and β are two constants.

Value SSF(j) is compared with a threshold, and the intervals
at which the threshold is exceeded are considered as the start points
of sounds such as Fv, Fn, P, A.
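A minimal sketch of the spectral stability test follows. It assumes base-10 logarithms and implements only the part of formula (6) that involves the constant ε; the second constant mentioned in the text is not used here, which is a simplification of this sketch. Function names, the value of ε and the threshold are hypothetical.

```python
import math


def spectral_stability(e_h, eps=1.0):
    """Return SSF(j) for each interval; the two end intervals are set to 0."""
    db = [10.0 * math.log10(e) for e in e_h]        # high-band energy in dB
    ssf = [0.0] * len(db)
    for j in range(1, len(db) - 1):
        num = abs(db[j + 1] - db[j - 1])
        den = eps + abs(db[j + 1] + db[j - 1])
        ssf[j] = num / den
    return ssf


def onset_intervals(e_h, threshold, eps=1.0):
    """Indices j at which SSF(j) exceeds the threshold."""
    return [j for j, v in enumerate(spectral_stability(e_h, eps)) if v > threshold]


# Hypothetical high-band energies with an abrupt rise between intervals 2 and 3.
energies = [1.0, 1.0, 1.0, 100.0, 100.0, 100.0]
ssf = spectral_stability(energies)
onsets = onset_intervals(energies, 0.5)
```

The function is close to zero while the high-band energy is stable and rises where the energy changes abruptly from one side of the interval to the other.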

In a limited neighborhood j-N, ..., j+N of each of said
intervals, taken as reference and denoted by index j, a check
procedure is effected to establish which of the following conditions
has taken place:

- most of the previous intervals (j-1, ..., j-N) have been classified
  as:
  a) silence Q
  b) voiced or quasi voiced (signal VNV)
- most of the subsequent intervals (j+1, ..., j+N) have been
  classified as:
  c) unvoiced or quasi unvoiced
  d) voiced or quasi voiced
  e) quasi voiced or quasi unvoiced
- in most of the subsequent intervals the total energy E_T has a value
  which, compared with a threshold E_1, is:
  f) E_T < E_1
  g) E_T > E_1
- in interval j the total energy E_T has a value which, compared with
  a threshold E_2 > E_1, is:
  h) E_T < E_2
  i) E_T > E_2

Upon the check of these conditions, the following possible
indications of phonetic classes are emitted for interval j and the
following ones:

- an indication of voiced fricative class Fv if the following
  conditions are met:
  a), d), h); or b), d); or a), e), h); or b), e)
- an indication of unvoiced fricative class Fn for the following
  conditions:
  a), c), f); or b), c)
- an indication of plosive class P for conditions:
  a), d), i); or a), e), i)
- an indication of affricate class A for conditions:
  a), c), g)

Once identified, one of said indications is maintained for a
number of intervals, starting from interval j, determined as follows:

- the indication of class Fv is maintained up to the interval in
  which condition i) takes place again;
- the indication of class Fn is maintained up to the interval in which
  condition d) occurs again;
- the indication of class P lasts for only one interval if it has
  been detected with conditions a), d), i); otherwise it is
  maintained for all the intervals in which condition e) is present;
- the indication of class A is maintained up to the interval in
  which condition d) occurs again.
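The combinations of conditions listed above for the emission of the class indications can be written as a small lookup, sketched here in Python. The order in which the classes are tested and the names used are assumptions of this sketch; the duration rules are not modelled.

```python
# Each class is emitted when all conditions of at least one alternative are met.
RULES = [
    ("Fv", [{"a", "d", "h"}, {"b", "d"}, {"a", "e", "h"}, {"b", "e"}]),
    ("Fn", [{"a", "c", "f"}, {"b", "c"}]),
    ("P", [{"a", "d", "i"}, {"a", "e", "i"}]),
    ("A", [{"a", "c", "g"}]),
]


def classify_onset(conditions):
    """Return 'Fv', 'Fn', 'P', 'A' or None for the set of conditions met.

    conditions: iterable of letters 'a'..'i' found true around interval j.
    """
    met = set(conditions)
    for label, alternatives in RULES:
        if any(alternative <= met for alternative in alternatives):
            return label
    return None


# Example: silence before, voiced after, high energy in interval j.
example = classify_onset({"a", "d", "i"})
```

With the conditions of the example the sketch returns the plosive class P.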

For sequences of intervals not classified as Q, Fv, Fn, P, A,
a search is effected for possible minimum energy values in the
intermediate band E_M: an algorithm known in the art as "DIP search
algorithm" is used, as disclosed for instance in the above-mentioned
paper by C.J. Weinstein et al.

According to this algorithm a linear interpolation is
performed among contiguous values E_M(j) by a smoothing function, to
smooth the instantaneous peaks in the values of E_M, which peaks are
not significant to the search for the above minima, thus obtaining,
for said sequences of intervals, a smoothed mean energy function
E'_M(j).

Then the trend in time of the values E'_M(j) is considered:
maxima and minima are searched for, and the ratios between a minimum
and each of the two adjacent maxima are calculated; if even only one
of the two ratios is greater than a fixed threshold, then in the
neighborhood of the interval corresponding to the minimum of E'_M a
nasal sound N or a semivowel sound S is identified.

To decide whether the sound is N or S, the duration of the
time interval in which the differences between the energy values and
the minimum are within a certain range is considered. If said duration
exceeds a certain threshold, in the pertaining intervals the sound is
classified as N, otherwise it is classified as S. The other intervals
of these sequences, which are classified neither N nor S, are
classified as V (vowel).
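The search for energy dips and the N/S/V decision can be sketched as follows. Several choices are assumptions of this sketch rather than statements of the text: the smoothing is a three-point moving average, each ratio is taken as adjacent maximum divided by minimum, the adjacent maxima are found by climbing from the minimum in each direction, and all thresholds and names are hypothetical.

```python
def smooth(values):
    """Three-point moving average (an assumed smoothing function)."""
    out = []
    for j in range(len(values)):
        window = values[max(0, j - 1): j + 2]
        out.append(sum(window) / len(window))
    return out


def adjacent_maximum(e, m, step):
    """Climb from minimum m in direction step (+1 or -1) to the nearest maximum."""
    j = m
    while 0 <= j + step < len(e) and e[j + step] >= e[j]:
        j += step
    return e[j]


def classify_dips(e_m, ratio_threshold, band, duration_threshold):
    """Label each interval 'N', 'S' or 'V' from mid-band energies e_m."""
    e = smooth(e_m)
    labels = ["V"] * len(e)
    for m in range(1, len(e) - 1):
        if not (e[m] < e[m - 1] and e[m] <= e[m + 1]):
            continue                                  # not a local minimum
        floor = max(e[m], 1e-12)                      # avoid division by zero
        ratios = [adjacent_maximum(e, m, -1) / floor,
                  adjacent_maximum(e, m, +1) / floor]
        if max(ratios) <= ratio_threshold:
            continue                                  # dip too shallow
        lo = hi = m
        while lo > 0 and e[lo - 1] - e[m] <= band:
            lo -= 1
        while hi < len(e) - 1 and e[hi + 1] - e[m] <= band:
            hi += 1
        label = "N" if (hi - lo + 1) > duration_threshold else "S"
        for j in range(lo, hi + 1):
            labels[j] = label
    return labels


long_dip = classify_dips([10, 10, 2, 2, 2, 2, 2, 10, 10], 2.0, 1.0, 2)
short_dip = classify_dips([10, 10, 10, 1, 1, 10, 10, 10], 2.0, 1.0, 2)
```

With these hypothetical thresholds the long dip is labelled as a nasal and the short one as a semivowel, while the intervals outside the dips remain vowels.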

If the number of consecutive intervals classified as V is
greater than a threshold, a search procedure is activated for possible
consecutive vowels (diphthongs).

To this aim the trends of the interpolated values of the
lowest formant frequencies F_r during said sequences of intervals are
considered.

If said trends present constant regions whose mean levels
differ from one another by values greater than a threshold, each of
said regions identifies a vowel. The values of said levels are then
used also during the word recognition step.

The method used for word recognition basically employs known
algorithms, such as tree search, pattern matching and dynamic
programming algorithms, as described e.g. in the paper by J.S. Bridle,
R.M. Chamberlain, M.D. Brown: "An algorithm for connected word
recognition", International Conference on Acoustics, Speech and Signal
Processing, pp. 899-902, Paris, May 1982.

Class sequences comprised between two sufficiently long
silence periods are investigated by these procedures.

Some indications of classes lasting for too small a number of
consecutive intervals are eliminated in each of said class sequences.
In fact, statistically, the indications of classes Fv, Fn, V, A whose
duration is too short correspond to classification errors.

Then, within said sequences, equal consecutive classes are
united under a single indication, with the exception of diphthongs or
different consecutive vowels, for which as many consecutive
indications V are maintained as there are vowels, thus obtaining
reduced sequences of classes.
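The elimination of too-short indications and the merging of equal consecutive classes can be sketched as follows. Distinct vowels are assumed to carry distinct labels (V1, V2, ...), so that merging keeps one indication per vowel; the minimum run length and the function name are hypothetical.

```python
SHORT_RUN_SUSPECT = ("Fv", "Fn", "A")      # plus any vowel label starting with "V"


def reduce_sequence(labels, min_run=2):
    """Return the reduced class sequence for a per-interval label sequence."""
    runs = []                               # [label, length] for each run
    for lab in labels:
        if runs and runs[-1][0] == lab:
            runs[-1][1] += 1
        else:
            runs.append([lab, 1])
    kept = []
    for lab, length in runs:
        suspect = lab in SHORT_RUN_SUSPECT or lab.startswith("V")
        if suspect and length < min_run:
            continue                        # too short: treated as an error
        kept.append(lab)
    reduced = []
    for lab in kept:                        # merge runs that became adjacent
        if not reduced or reduced[-1] != lab:
            reduced.append(lab)
    return reduced


intervals = ["Q"] * 4 + ["Fn"] * 5 + ["V1"] * 5 + ["V2"] * 5 + ["Q"] * 5
reduced = reduce_sequence(intervals)
```

Applied to a sequence of four Q, five Fn, five V1, five V2 and five Q intervals, the sketch yields Q Fn V1 V2 Q.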

For example, the reduced sequence Q Fn V1 V2 Q is derived from
the following possible class sequence:

Q Q Q Q  Fn Fn Fn Fn Fn  V1 V1 V1 V1 V1  V2 V2 V2 V2 V2  Q Q Q Q Q



Each reduced class sequence S(L), where L is the number of
symbols, is analyzed by the pattern matching search algorithm, which
compares it with the sequences corresponding to the words of the
vocabulary until similarity is found with at least one of them: under
the hypothesis of a limited vocabulary, the sequence found is the only
one possible. If, on the contrary, no match is found, the entire
reduced sequence is analyzed by applying a dynamic programming
algorithm, which searches for an acceptable class sequence having the
minimum distance from that under test. If this distance is less than a
fixed threshold, the sequence is recognized as valid, otherwise it is
not.

An example of limited vocabulary may be the sequence of
digits (0, 1, ..., 9).
In the Italian language, for digit representation, the
following graphemes pertaining to the following classes are to be
used:

Class   Graphemes
Fv      z, v
Fn      s
P       d, t, q
A       c
N       n
S       r
V       a, e, i, o, u



In addition the following correspondence is obtained between
digits and reduced class sequences:

Digit               Reduced class sequence
ZERO                FvVSV
UNO      (one)      VNV
DUE      (two)      PVV
TRE      (three)    PSV
QUATTRO  (four)     PVVQPSV
CINQUE   (five)     AVNQPVV
SEI      (six)      FnVV
SETTE    (seven)    FnVPV
OTTO     (eight)    VPV
NOVE     (nine)     NVFvV



If the classes are correctly identified any digit sequence 
can be recognized. 
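The two-stage recognition (exact pattern matching first, then a minimum-distance search by dynamic programming) can be sketched for single words as follows. The vocabulary reproduces the digit correspondence given above; the use of the edit distance as the distance measure, the threshold value and the function names are assumptions of this sketch, and the splitting of a connected sequence into words is not modelled.

```python
VOCABULARY = {
    "ZERO": ["Fv", "V", "S", "V"],
    "UNO": ["V", "N", "V"],
    "DUE": ["P", "V", "V"],
    "TRE": ["P", "S", "V"],
    "QUATTRO": ["P", "V", "V", "Q", "P", "S", "V"],
    "CINQUE": ["A", "V", "N", "Q", "P", "V", "V"],
    "SEI": ["Fn", "V", "V"],
    "SETTE": ["Fn", "V", "P", "V"],
    "OTTO": ["V", "P", "V"],
    "NOVE": ["N", "V", "Fv", "V"],
}


def edit_distance(x, y):
    """Edit distance between two class sequences (dynamic programming)."""
    prev = list(range(len(y) + 1))
    for i, xi in enumerate(x, start=1):
        cur = [i]
        for j, yj in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (xi != yj)))   # substitution or match
        prev = cur
    return prev[-1]


def recognize(sequence, threshold=2):
    """Return the matching word, or None if no word is close enough."""
    seq = list(sequence)
    for word, ref in VOCABULARY.items():                # exact pattern matching
        if ref == seq:
            return word
    best = min(VOCABULARY, key=lambda w: edit_distance(seq, VOCABULARY[w]))
    if edit_distance(seq, VOCABULARY[best]) < threshold:
        return best                                     # closest acceptable word
    return None


exact = recognize(["Fv", "V", "S", "V"])
repaired = recognize(["N", "V", "Fv"])                  # final vowel missed
rejected = recognize(["A"] * 8)
```

The dynamic programming stage is only reached when no exact match exists, so a correctly classified word is recognized without computing any distance.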

The only ambiguity might arise in the identification of the digit
sequence "due, tre" or of the digit "quattro": in fact in both cases
the reduced class sequence is PVVQPSV. In this case, however, it is
sufficient to check the distance between the values of the formant
frequencies F_r of the two vowels marked in the example (the final
vowels of "due" and of "tre"): the distance must be below a fixed
threshold in one case (due, tre), because the same vowel is present,
and above the threshold in the other (quattro), because the vowels are
different.

A device for implementing the described method is given
hereinbelow as a non-limiting example with reference to the annexed
drawings, in which:

- Fig. 1 is a general block diagram of the device according to the
  invention;
- Fig. 2 is a circuit diagram of block SIL of Fig. 1;
- Fig. 3 is a circuit diagram of block CLSS of Fig. 1;
- Fig. 4 is a circuit diagram of block FPA of Fig. 3;
- Fig. 5 is a circuit diagram of block DUR of Fig. 3.

In Fig. 1 AD denotes a block which converts into digital form
the analog speech signal it receives on wire 1 and then subdivides it
into time intervals, so that each interval contains an integer number N
of digital samples s_n of the speech signal: the samples of each
interval are supplied on connection 2, while on wire 3 a reference
signal at interval frequency is supplied.

LPC denotes a block which calculates the values a_i, ρ_i of
formulae (2), (3) and supplies them on connection 4.

RE indicates a block which, by using values a_i, ρ_i,
calculates at each interval the normalized residual energy E_R given
by formula (4) and supplies it on connection 5.

FRM denotes a block which, at each interval, determines the
values of the formant frequencies F_r by calculating function H(z)
point-by-point, using the values a_i it receives from connection 4.
Values F_r are supplied on connection 6.

SIFT denotes a block apt to implement the homonymous
algorithm of classification of the speech signal as voiced or
unvoiced: SIFT receives the digital samples present on connection 2
and supplies on wire 7 signal SF indicative of "threshold exceeded".

FFT denotes a block calculating the Fast Fourier Transform of
the digital samples of each interval it receives from connection 2.
The transformed values are supplied through connection 8 to block GEN,
which calculates at each interval the energy values E_T, E_L, E_H,
E_M, supplied on connections 9, 10, 11, 12 respectively.

SSF denotes a block which, at each interval, calculates the
value of the spectral stability function (6), starting from the energy
values E_H it receives through connection 11. SSF emits on wire 13 a
signal which, at each interval, indicates whether the computed value
is below or above a fixed threshold.

SMOT denotes a block which receives the energy values of the
intermediate band E_M present on connection 12 and performs the linear
interpolation between adjacent values, i.e. the first part of the
above-described "DIP" algorithm, and obtains the values of the
smoothed mean energy function E'_M(j), which are then supplied on
connection 14: each value E'_M is also accompanied by the possible
indication of maximum or minimum point, obtained by comparing said
value with the preceding and the subsequent one.

The implementation of blocks LPC, RE, FRM, SIFT, FFT, CEN,
SSF, SMOT is not a problem for the person skilled in the art once the
functions performed are known, which functions have been examined
while describing the method. E.g. these blocks can be implemented by
known microprogrammed structures, provided their computing rate is
compatible with real-time processing requirements.
COMP denotes a block comprising two usual majority
comparators, which compare the total energy values E_T present on
connection 9 with the two threshold levels E_1 and E_2 respectively.

COMP emits over wires 15 and 15' signals indicative of the
result of said comparisons, according to the following correspondence
between logic levels and E_T values:

Logic level on wire 15:   "0" → E_T < E_1
Logic level on wire 15:   "1" → E_T > E_1
Logic level on wire 15':  "0" → E_T < E_2
Logic level on wire 15':  "1" → E_T > E_2

VNV denotes a block comprising a divider apt to compute the
ratio R between the energy values E_L and E_H it receives through
connections 10, 11, and a threshold comparator for said ratio. VNV, in
addition, comprises a normal combinatory logic generating a signal
with four logic levels indicating the voicing degree of the speech
signal at each interval, by combining the logic levels of the output
of the internal comparator and of wire 7 so as to obtain the following
correspondences between logic levels on connection 16 and voicing
degrees:







Connection 16   Voicing degree
11              voiced signal
10              quasi voiced signal
01              quasi unvoiced signal
00              unvoiced signal

SIL denotes a block apt to classify each interval as silence
Q or voice. SIL calculates the values E_RIF starting from the values
of total energy E_T which it receives from connection 9, and compares
them with the values of residual energy E_R present on connection 5.
The comparison results are supplied on wire 17, according to the
following convention: logic level "1" means silence, while "0" means
voice.

An example of embodiment of block SIL will be described with
reference to Fig. 2.

RG1 denotes a common buffer register which receives the data
present on connections 6, 13, 14, 15, 16, 17 and combines them so as
to create data words, each made up of the data pertaining to a given
interval; these data will be present at the same time at the RG1
inputs thanks to the insertion of suitable delay circuits into the
upstream blocks.

RG1 is synchronized by the interval-frequency signal it
receives on wire 3, and supplies on output bus 18 the words composed
in this way.

MEM1 denotes a random access memory which at each interval
contains the last Z data words received from RG1. In writing, MEM1
behaves as a shift register for the words it receives from RG1 at the
instants at which the interval-frequency signal is active on wire 3;
this signal acts as clock signal and as read/write signal. During
reading, however, the access to MEM1 is random; the data read are
supplied on bus 19 to block CLSS, which generates the corresponding
reading addresses on bus 20.

The circuit blocks examined till now operate in a synchronous
mode with the data present on connection 2, and with a constant delay.

CLSS acts as a sound classifier according to the eight
above-mentioned classes. The classification is performed starting from
the analysis of the data words present in MEM1. The structure of block
CLSS will be examined in detail with reference to Fig. 3.

ELB denotes a block designed to recognize word sequences. ELB



- 15 - 0173986 

comprises a memory of reduced sequences of classes, corresponding to
vocabulary word sequences, a memory for values F_r, and a memory for
the class sequences it receives from block CLSS on bus 21.

ELB comprises means for carrying out the tree searches of
pattern matching with the memorized reduced class sequences, and
means for performing the searches for acceptable reduced class
sequences according to dynamic programming techniques.

The embodiment of block ELB is not a problem for the person
skilled in the art once the functions carried out, investigated during
the method description, are known. ELB can be implemented with a known
microprogrammable structure, provided its computing rate is compatible
with real-time processing requirements.

Blocks CLSS and ELB work in an asynchronous mode.

In Fig. 2 COT1 denotes a counter, synchronized by the
interval-frequency signal on wire 3, which supplies an enabling signal
on wire 25 until the maximum counting value is reached.

SM1 denotes an adder which, during the intervals in which it is
enabled by the signal on wire 25, i.e. during the initial intervals of
operation of the device of Fig. 1, adds the value E_T present on
connection 9 to the content of register RG2, available on connection
26. Said content is the result of the addition performed by SM1 at the
preceding interval. When the enabling on wire 25 is over, the initial
total energy value E_TI will be present at the RG2 output.

At each interval said value is subtracted in a subtractor SM2
from the value of total energy E_T present on connection 9.

The subtraction result is supplied on connection 27 to
multiplier ML1, which multiplies it by the constant value a available
at the output of memory element MM.

The multiplication result is the value of reference energy
E_RIF, which is supplied through connection 28 to an input of majority
comparator CMP1, which compares it with the value of residual energy
E_R it receives at the second input from block RE (Fig. 1) through




connection 5.

On output wire 17 of CMP1 a logic "1" is present if E_R > E_RIF
(condition corresponding to the classification of the interval as
silence Q), otherwise a logic "0" is present.

All the blocks of Fig. 2 are synchronized by the
interval-frequency signal present on wire 3.

In Fig. 3 reference RQ denotes a register which reads from bus
19, carrying the data coming from memory MEM1 (Fig. 1), the field of
each data word carrying the silence/voice indication. The indication
of silence sets output Q, while the indication of voice sets the
complementary output Q̄ (Q negated).

RSF denotes a register which reads from bus 19 the field of
the data words carrying information on whether the spectral-stability
threshold has been exceeded. When signal Q is active, RSF outputs are
activated: more particularly, the "threshold exceeded" indication sets
output SF, while the "threshold not exceeded" indication sets the
complementary output (SF negated). Registers RQ, RSF are synchronized
by clock signal CK.

IND1 denotes a first addressing unit for memory MEM1, allowing
the reading of the silence/voice field of the addressed words, which
field is then memorized in RQ.

IND1 comprises an up/down programmable counter, which is
synchronized by clock signal CK and which usually counts up; on the
contrary, when it receives a pulse on wire 3 it decrements the count
by one unit. In addition said counter is stopped when signal Q̄ is
active, and is programmed at the address value present on bus 20 when
the output signal of OR gate P5 is active. IND1 emits the counting
values as addresses on bus 20, while at each counting increment it
emits a pulse on wire 30.

IND2 denotes a second addressing unit for memory MEM1, which
allows the reading of the data word fields relating to the
identification of sounds Fv, Fn, P, A, which fields are supplied to
blocks FPA and DUR through the respective wires of data bus 19.

IND2 comprises an up/down programmable counter, synchronized
by clock signal CK, which emits the values counted on bus 20 as
addresses for MEM1. Said counter begins counting up when output SF of
RSF is set, provided the counter is not inhibited by a signal coming
from block IND3 through bidirectional connection 31. On the contrary,
when it receives a pulse on wire 3, it decrements the count by one
unit.

At each activation the counter executes two consecutive
countings: the first is an up counting by 2N+1 units and starts from
the value present on bus 20 at the activation instant, decremented by
N; the second counting starts from the value present on bus 20 at the
activation instant and is incremented until an end-of-counting signal
arrives on wire 32.

IND2 supplies block IND3 with an inhibition signal, through
connection 31, during its operation. Moreover, it supplies on
connection 33 the values counted to block FPA and to block DUR during
the first and the second of the two consecutive countings,
respectively; said values act as synchronism signals for the
operations of blocks FPA, DUR.

IND3 denotes a third addressing unit for memory MEM1,
allowing the reading of the fields of the data words relevant to the
identification of sounds V, N, S, which fields are supplied to block
VNS through the corresponding wires of data bus 19.

IND3 comprises a programmable up/down counter, which is
synchronized by clock signal CK and emits the values counted on bus 20
as addresses for MEM1. Said counter starts counting up when output SF
of RSF is set, provided the counter is not inhibited by the inhibition
signal supplied by IND2 on connection 31. As long as IND3 operates, it
emits on the same connection 31 the inhibition signal for IND2 and on
connection 34 the values counted, acting as activations for the
operations of block VNS.

IND3 receives control signals for up or down counting or
pause through connection 34 from block VNS, from which it also
receives via wire 35 an end-of-operation signal which stops the
counter. The counter of IND3 also decrements the count by one unit
when it receives a pulse on wire 3.

VNS denotes a block carrying out the functions of
identification of sounds V, N, S.

It receives through bus 19 the bits of the following fields
of the data words: silence Q, spectral stability threshold, values of
formant frequencies F_r, values of smoothed mean-energy function E'_M
and relevant indications of maximum and minimum points.

VNS is activated by the signal received from block IND3
through connection 34, whereon it also supplies the control signals
for the counter of IND3, and is synchronised by clock signal CK. VNS
supplies on the outputs S, N, V, D the indications of semivowel,
nasal, vowel, diphthong, respectively, and for each interval
identified by one of said classes emits a pulse on wire 36. In
addition it emits the end-of-operation signal on wire 35, which signal
is carried to block IND3 and to an input of gate P5; said signal is
generated in correspondence with the interval (and hence with the data
word) in which signals Q or SF become active again.

The implementation of VNS is not a problem for the person
skilled in the art, once its functions, discussed above as well as
during the method description, are known.

VNS may be implemented, e.g., with a known type of
microprogrammed structure, provided its computing rate is compatible
with real-time processing requirements.

FPA denotes a block checking the occurrence of conditions a,
..., i, described in the method, for detecting sounds Fv, Fn, P, A. To
this aim it receives the fields of data words present on bus 19
relevant to silence, comparisons with energy thresholds E1, E2, and
voicing degrees; further it receives the values counted by IND2 via
connection 33; FPA emits the indications of occurrence of conditions
a, ..., i on the homonymous wires of connection 37.

An example of embodiment of FPA will be described with
reference to Fig. 4.

LGC denotes a combinatory logic emitting on connection 38
signals indicating voiced fricative (Fv1), unvoiced fricative (Fn1),
plosive (P1, P2), affricate (A1) class, combining the indications of
occurrence of conditions a, ..., i it receives on connection 37 as
indicated in the following truth table, corresponding to the method
described above of emission of phonetic-class indications:



        Conditions (connection 37)        Class signals (connection 38)

     a   b   c   d   e   f   g   h   i     Fv1  Fn1  P1   P2   A1

     1   0   0   1   0   0   0   1   0      1    0    0    0    0
     1   0   0   0   1   0   0   1   0      1    0    0    0    0
     0   1   0   1   0   -   -   -   -      1    0    0    0    0
     0   1   0   0   1   -   -   -   -      1    0    0    0    0
     1   0   1   0   0   1   0   0   0      0    1    0    0    0
     0   1   1   0   0   -   -   -   -      0    1    0    0    0
     1   0   0   1   0   0   0   0   1      0    0    1    0    0
     1   0   0   0   1   0   0   1   0      0    0    0    1    0
     1   0   1   0   0   0   1   0   0      0    0    0    0    1



In the table, "1" indicates "condition occurred", "0"
"condition not occurred", "-" "don't care". The plosive class
indication is carried by two signals: P1, relating to the type of
plosive lasting one interval only, and P2, relating to the occurrence
of condition e). By using the truth table above, any person skilled in
the art can implement block LGC.
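
As an illustration only, the combinatory logic of block LGC can be
sketched in software. The sketch below transcribes literally the
product terms listed in claim 11; the function names `lgc` and `bits`
and the dictionary representation of connections 37 and 38 are
assumptions made for the example, not part of the described device.

```python
from typing import Dict


def bits(pattern: str) -> Dict[str, int]:
    """Map a 9-character string of 0/1 onto condition names a..i."""
    return dict(zip("abcdefghi", (int(ch) for ch in pattern)))


def lgc(cond: Dict[str, int]) -> Dict[str, int]:
    """Combine condition bits a..i into the class signals of connection 38.

    Each expression is a literal transcription of the terms of claim 11.
    """
    a, b, c, d, e, f, g, h, i = (bool(cond[k]) for k in "abcdefghi")
    return {
        "Fv1": int((a and d and h) or (a and e and h) or (b and d) or (b and e)),
        "Fn1": int((a and c and f) or (b and c)),
        "P1": int(a and d and i),
        "P2": int(a and e and h),
        "A1": int(a and c and g),
    }


if __name__ == "__main__":
    # Silence before, voiced after, onset energy above E2: plosive P1.
    print(lgc(bits("100100001")))
```

As worded in claim 11, the term "first, fifth and eighth" appears
under both Fv1 and P2, so this literal transcription raises both
signals for that combination of conditions.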

The signals indicating classes coming from LGC, before being
carried to block ELB (Fig. 1), are supplied to block DUR, which
determines the number of consecutive intervals identified by the class
received from LGC through connection 38. DUR receives from bus 19 the
same data as block FPA and from connection 33 the values counted by
IND2. DUR, when enabled by IND2, emits the class indications on the
outputs denoted by the same class symbols Fv, Fn, P, A on the basis of
the corresponding class signals received from LGC; besides, it emits
on wire 39 a pulse for each interval identified with that class. At
the end of the operations, DUR emits a pulse on wire 32, which is
connected to the blocking input of IND2 and to an input of gate P5 to
restart IND1.

An example of embodiment of block DUR will be described in
connection with Fig. 5.

The pulses present on wires 30, 36, 39 are combined by logic
gate P6 so as to supply on wire 40 a pulse for each interval
identified with any class.

BT denotes a conventional time base generating clock signal
CK for all circuits of CLSS. BT is blocked during all the periods in
which the signal on wire 3 is active, i.e. during the data writing
phases in MEM1 (Fig. 1). Besides, at the beginning of the procedures
BT remains blocked for a determined number of pulses present on wire
3.

The indications of classes on the outputs of RQ, VNS, DUR and
the signal on wire 40 are supplied on bus 21, which is connected to
block ELB (Fig. 1).

In Fig. 4 CA, CB, ..., CI denote nine conventional counters
checking conditions a, b, ..., i, respectively.

Said counters, when enabled, count the pulses received from 
block DIN. 

Enablings for the counters are obtained by the following
combinations of the data present on bus 19:

- for counter CA, level "1" of voice/silence field on wire 171;

- for counters CB, CD, level "1" of the higher-weight bit of the
field of voicing degree over wire 162; for counter CC, instead,
level "0" of said bit;

- for counter CE, logic EX-OR, executed in gate F4, of the two bits
of the voicing-degree field present on wires 161, 162;

- for counters CF, CG, logic levels "0" and "1" of the field of
comparison of energy ET with threshold E1, present on wire 151;

- for counters CH, CI, logic levels "0" and "1" of the field of
comparison of energy ET with threshold E2, present on wire 151.

DIN separates the counting pulses received through connection
33; over wire 41 it supplies the pulses from (j-N) to (j-1), where j
is the value present on bus 20 (Fig. 3) at the beginning of IND2
counting; over wire 43 pulse j; over wire 42 pulses from (j+1) to
(j+N).

Upon reception of pulse (j+N) DIN supplies on wire 44 an
enabling signal which also acts as a reset signal for all the
counters.

When enabled, counters CA, CB count the pulses on wire 41;
counters CC, CD, CE, CF, CG those on wire 42; counters CH, CI those on
wire 43.

Since on wire 43 a single pulse takes place, counters CH, CI, 
when enabled, supply it to the output on wires 52, 53 respectively. 

Counters CA, ..., CG instead supply a logic "1" on the
output, on wires 45, 46, 47, 48, 49, 50, 51 respectively, if they
reach counting value N/2+1, i.e., half plus 1 of the received pulses.

Signals on wires 45, ..., 53 are applied to the inputs of
register RG3, which supplies them on bus 37 when it receives the
enabling signal on wire 44.
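
The counting scheme of Fig. 4 can be approximated in software as
follows. This is a minimal sketch under stated assumptions: the
`DataWord` fields, the function name `fpa_conditions` and the
representation of MEM1 as a Python sequence are hypothetical, and the
energy-comparison bits are assumed to be "1" when the total energy
exceeds the threshold.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class DataWord:
    silence: int  # 1 = silence (voice/silence field)
    v_hi: int     # higher-weight bit of the voicing-degree field
    v_lo: int     # lower-weight bit of the voicing-degree field
    gt_e1: int    # assumed: 1 if total energy exceeds threshold E1
    gt_e2: int    # assumed: 1 if total energy exceeds threshold E2


def fpa_conditions(words: Sequence[DataWord], j: int, n: int) -> Dict[str, int]:
    """Evaluate conditions a..i around onset interval j with half-width n."""
    if j - n < 0 or j + n >= len(words):
        raise ValueError("window (j-N .. j+N) falls outside the stored words")
    before = words[j - n:j]          # pulses (j-N) .. (j-1), wire 41
    after = words[j + 1:j + n + 1]   # pulses (j+1) .. (j+N), wire 42
    onset = words[j]                 # pulse j, wire 43
    need = n // 2 + 1                # counters fire on reaching N/2 + 1

    def majority(ws: Sequence[DataWord], pred: Callable[[DataWord], bool]) -> int:
        return int(sum(1 for w in ws if pred(w)) >= need)

    return {
        "a": majority(before, lambda w: w.silence == 1),
        "b": majority(before, lambda w: w.v_hi == 1),
        "c": majority(after, lambda w: w.v_hi == 0),
        "d": majority(after, lambda w: w.v_hi == 1),
        "e": majority(after, lambda w: (w.v_hi ^ w.v_lo) == 1),
        "f": majority(after, lambda w: w.gt_e1 == 0),
        "g": majority(after, lambda w: w.gt_e1 == 1),
        "h": int(onset.gt_e2 == 0),
        "i": int(onset.gt_e2 == 1),
    }
```

With N = 4, a window made of four silence words, an onset word whose
energy exceeds E2 and four fully voiced words yields conditions a, d
and i, the combination that the truth table associates with P1.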

In Fig. 5, RG4, RG5, RG6, RG7 denote four registers which,
when enabled by the pulses on wire 33 (pulses relevant to the second
of the consecutive countings executed by block IND2 of Fig. 3), supply
to the outputs the signals applied to the inputs, connected to wires
60, 61, 62, 63, respectively. Register outputs carry sound-class
indications, and pulses on wire 33 determine the number of intervals
during which said indications are valid.

Register RG4 supplies the indication of class Fv if and as
long as the signal on wire 60 is active, which signal is supplied by
the output of gate P11 executing the logic AND of signal Fv1, coming
through bus 38 from logic LGC (Fig. 3), and of the signal on wire 151,
belonging to bus 19.

Register RG5 supplies the indication of class Fn if and as
long as the signal on wire 61 is active, which signal is supplied by
the output of gate P12, which executes the logic AND of signal Fn1,
coming from bus 38, and of the complement value of the signal on wire
162 coming from bus 19.

Register RG6 supplies the indication of class A if and as
long as the signal on wire 62 is active, which signal is supplied by
the output of gate P13, which executes the logic AND of signal A1,
coming from bus 38, and of the complement value of the signal on wire
162.

Register RG7 supplies on wire 65 one of the two possible
indications of plosive sound if and as long as the signal on wire 63
is active, which signal is supplied by the output of gate P14, which
executes the logic AND of signal P2, coming from bus 38, and of the
output of gate P10, which executes the logic EX-OR of the signals on
wires 161, 162 coming from bus 19.

Signal P1 coming from bus 38 and the signal on wire 63 are
supplied to OR gate P15, which emits the indication of class P.

The signals on wires 60, ..., 63 and signal P1 are applied
to the inputs of register RG8, synchronized by the pulses on wire 33.
RG8 emits a signal on wire 32, active when the input signals are no
longer active.

The signal on wire 32, when active, stops the counter of
IND2, emitting the pulses on wire 33.

The signals of class Fv, Fn, A, P are also carried to the
inputs of register RG9, which emits on wire 39 the pulses present on
wire 33 when one of the indications of such classes is active.
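
The behaviour of block DUR, i.e. holding a class for as long as its
gating condition remains true, can be sketched as below. The names
`dur` and `HOLD`, the dictionary layout of a data word and the field
`below_e2` (assumed to be "1" while total energy is less than E2, as
stated for class Fv in claim 12) are assumptions made for the example
only.

```python
from typing import Callable, Dict, List, Tuple

Word = Dict[str, int]

# Class signal from LGC -> (emitted class, condition under which it persists).
HOLD: Dict[str, Tuple[str, Callable[[Word], bool]]] = {
    "Fv1": ("Fv", lambda w: w["below_e2"] == 1),
    "Fn1": ("Fn", lambda w: w["v_hi"] == 0),
    "A1": ("A", lambda w: w["v_hi"] == 0),
    "P2": ("P", lambda w: (w["v_hi"] ^ w["v_lo"]) == 1),
}


def dur(signal: str, words: List[Word]) -> Tuple[str, int]:
    """Return the class and the number of consecutive intervals it labels.

    `words` are the data words read during the second counting of IND2,
    starting from the onset interval.
    """
    if signal == "P1":
        # P1 marks a plosive lasting the onset interval only.
        return ("P", 1)
    label, holds = HOLD[signal]
    count = 0
    for w in words:
        if not holds(w):
            break  # corresponds to the stop signal on wire 32
        count += 1
    return (label, count)
```

For instance, two unvoiced or quasi-unvoiced words followed by a
voiced one give an unvoiced fricative spanning two intervals.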

The operation of circuit CLSS of Fig. 3 will now be
described.

At the beginning of the procedures time base BT generates
signal CK with a delay of a certain number of intervals so as to allow
an initial partial filling of data-word memory MEM1 (Fig. 1).
Then, the counter of IND1 begins addressing MEM1: as long as
the silence/voice field of data words indicates silence, register RQ
supplies indication Q on bus 21, to which IND1 counting pulses are
also supplied through wire 30, gate P6 and wire 40, which pulses
determine the number of intervals characterized by class Q. All the
other circuits of CLSS remain de-energized.

When, on the contrary, the silence/voice field indicates
voice, the complementary output Q̅ of RQ activates register RSF and
stops unit IND1: if the spectral-stability field of the data word
present on bus 19 indicates spectral stability threshold exceeded,
output SF will be active and will activate addressing unit IND2;
otherwise the complementary output S̅F̅ will be active and will
activate addressing unit IND3.

If unit IND2 is activated, then the search for sounds Fv, Fn,
P, A begins. IND2 inhibits IND3 through the signal on connection 31,
up to the end of the search. The IND2 counter addresses data words in
MEM1 from position (j-N) to (j+N), where j is the address present on
bus 20 upon IND2 activation; data words are supplied to circuits FPA
and LGC, activated by the first counting sequence supplied on
connection 33. At the end of said first sequence, a combination of
conditions a, ..., i, decoded by LGC into one of the possible class
indications supplied on bus 38, is present on output bus 37 of FPA.

Then circuit DUR is energized by the second counting sequence
supplied by IND2 on connection 33. As long as DUR is active, it emits
one of classes Fv, Fn, P, A on bus 21, on the basis of the analysis of
the data words present on bus 19, and emits in addition on wire 39 the
pulses of the second counting of IND2, which pulses determine the
number of intervals identified by the relevant class, and are supplied
on bus 21 through gate P6 and wire 40.

When DUR detects the end of the class found out, it stops
IND2 counting by a signal on wire 32, which also determines the
reactivation of IND1, which begins again addressing MEM1 (Fig. 1) from
the value present on bus 20 at that instant. Operation control is then
taken again by register RQ as disclosed above. When output Q̅ is
active, RSF is activated again, and IND1 is stopped. When output S̅F̅
is active, addressing unit IND3 is activated. IND3 inhibits IND2
through connection 31 till the end of the operation, activates block
VNS through the signal on connection 34, and addresses MEM1 starting
from the address present on bus 20 at the activation. The addressed
data words are supplied to VNS which, on the basis of their analysis,
emits the indications of classes V, N, S, and of diphthong D on bus
21; VNS also emits the pause or up/down control signals for the IND3
counter on connection 34 and the pulses identifying the intervals
classified V, N, S on wire 36, connected to bus 21 through gate P6 and
wire 40.

When VNS detects the presence of class Q or spectral
stability threshold exceeded, it stops IND3 and reactivates IND1
through the signal on wire 35. Control is then taken again by unit
IND1, as already described.

It is worth noting that an active logic level on wire 3
(which condition occurs at each writing in MEM1 of a new data word)
determines the temporary stopping of time base BT and consequently of
all synchronized circuits of CLSS; besides, in the addressing units it
causes the decrement by a unit of the counter active at that instant,
to take into account the shift by a position of data words in MEM1
caused by the new writing.
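
The selection of the addressing unit and the address correction on
each new writing can be pictured with a small software model. It is
only a sketch: the class `Mem1`, its use of a bounded `deque` as a
shift memory, and the function `select_unit` are hypothetical, and the
decrement is applied here only once the memory is full, which is an
assumption about how the shift occurs.

```python
from collections import deque


def select_unit(silence: int, ssf_exceeded: int) -> str:
    """Pick the addressing unit for the data word currently on bus 19."""
    if silence:
        return "IND1"  # class Q: IND1 keeps scanning
    return "IND2" if ssf_exceeded else "IND3"


class Mem1:
    """Shift memory holding the last M data words plus one read address."""

    def __init__(self, m: int) -> None:
        self.words: deque = deque(maxlen=m)
        self.addr = 0  # value of the active counter on bus 20

    def write(self, word) -> None:
        full = len(self.words) == self.words.maxlen
        self.words.append(word)  # when full, older words shift by one place
        if full:
            self.addr -= 1       # pulse on wire 3: active counter decremented
            if self.addr < 0:
                raise IndexError("addressed word has been shifted out")

    def read(self):
        return self.words[self.addr]


if __name__ == "__main__":
    mem = Mem1(4)
    for k in range(4):
        mem.write(f"w{k}")
    mem.addr = 2
    mem.write("w4")
    print(mem.addr, mem.read())
```

After the extra write the counter has moved from 2 to 1 but still
points at the same data word, which is the purpose of the decrement
described above.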

Variations and modifications could be made to the example of 
embodiment described while remaining within the scope of the inven- 
tion. 
Claims

1. Method of recognition of connected words belonging to small
vocabularies, providing for an initial step wherein a speech signal is
digitised and is subdivided into intervals, in each interval an
acoustic-phonetic analysis being made comprising: a linear-prediction
encoding to calculate a normalized residual energy ER, formant
frequencies Fr and an autocorrelation function of residual signal Rx,
wherefrom a first indication of voiced or unvoiced signal is
extracted, depending on whether Rx is greater or less than a
threshold; a Fast Fourier Transform to calculate a total energy ET, a
low-frequency energy EL, an intermediate-frequency energy EM, and a
high-frequency energy EH, wherefrom a value of spectral stability
function SSF is calculated; providing also for an intermediate phase
comprising the analysis of the trend of said intermediate-frequency
energy EM and of said formant frequencies Fr, so as to assign to
interval sequences a division into nasal, semivowel, vowel classes
with an indication of presence of diphthongs; providing also an end
phase during which said word recognition is performed by analyzing a
sequence of phonetic indications obtained during said intermediate
phase by tree-search algorithms of pattern matching of
phonetic-indication sequences, and dynamic programming, characterized
in that said sequence of phonetic indications consists only of a
subdivision of the total sequence of intervals into phonetic classes
comprising said nasal, semivowel, vowel classes with diphthong
presence, and silence, voiced fricative, unvoiced fricative, plosive,
affricate classes; said silence class being detected during intervals
in which said normalized residual energy ER exceeds a value of
reference energy ERIF calculated on the basis of said total energy ET;
the beginning of one of said fricative, plosive, affricate classes
being detected in the interval in which said spectral stability
function exceeds a spectral stability threshold, their identification
being executed by analyzing, in a number of intervals (-N, +N)
preceding and following the interval of said beginning, the presence
of silence class, the comparison of said total energy ET with energy
thresholds (E1, E2), and a subdivision into four voicing degrees of
said intervals.

2. Method according to claim 1, characterized in that said value of
reference energy ERIF is given by the following formula:

ERIF = a (ET - ETi)

where a is a constant, ET is said total energy, ETi is a mean
total energy extending over a number of initial intervals.

3. Method according to claim 1 or 2, characterized in that a ratio K
between low-frequency and high-frequency energies EL, EH is calculated
and then compared with a threshold, and said voicing degrees are:

- voiced, if said first indication is of a voiced sound and
said ratio exceeds the threshold;

- quasi-voiced, if the first indication is of unvoiced sound
and said ratio exceeds the threshold;

- quasi-unvoiced, if the first indication is of voiced sound
and said ratio does not exceed the threshold;

- unvoiced, if the first indication is of unvoiced sound and
said ratio does not exceed the threshold.

4. Method according to claim 3, characterized in that a voiced
fricative class is identified if most of said preceding intervals have
been classified as silence, said total energy is less than a first of
said energy thresholds (E2) in the interval of beginning, and most of
the subsequent intervals have been assigned either a voiced or
quasi-voiced degree, or quasi-voiced or quasi-unvoiced degree; or even
if most of the preceding intervals have been assigned a voiced or
quasi-voiced degree, and most of the subsequent ones either a voiced
or quasi-voiced degree, or quasi-voiced or quasi-unvoiced degree; said
voiced fricative class lasting up to the interval in which total
energy ET exceeds said first threshold.

5. Method according to claim 3 or 4, characterised in that an unvoiced
fricative class is identified if most of the subsequent intervals have
been assigned an unvoiced or quasi-unvoiced degree, and if most of the
preceding intervals have been assigned a voiced or quasi-voiced degree
or the silence class, and in most of the subsequent intervals the
total energy ET is less than a second of said energy thresholds (E1),
said unvoiced fricative class lasting up to the interval which is
assigned the voiced or quasi-voiced degree.

6. Method according to any of claims 3 to 5, characterized in that a
plosive class is identified if most of the preceding intervals have
been assigned a silence class, during the interval of beginning the
total energy is greater than the first energy threshold (E2), and if
most of the subsequent intervals have been assigned voiced or
quasi-voiced degree, said plosive class identifying the interval of
beginning alone, or, if quasi-voiced or quasi-unvoiced degree has been
allotted, said subsequent intervals.

7. Method according to any of claims 3 to 6, characterised in that an
affricate class is identified if most of the preceding intervals have
been assigned the silence class, the subsequent intervals unvoiced or
quasi-unvoiced degree, and in most of the subsequent intervals total
energy is greater than said second energy threshold (E1), said
affricate class lasting until the interval which is reallotted the
voiced or quasi-voiced degree.

8. Device for implementing the method of any of claims 1 to 7,
comprising an analog-to-digital converter (AD) of the speech signal
subdivided into intervals, a circuit (LPC) for computing linear
prediction coefficients (a*, p^) relevant to digital samples of each
interval, followed by circuits for computing said normalized residual
energy ER, said formant frequencies Fr, said autocorrelation function
of the residual signal Rx and said first indication of voiced or
unvoiced sound, a circuit (FFT) for calculating the Fast Fourier
Transform of digital samples of each interval, followed by a circuit
(CEN) for computing said total, low-frequency, intermediate-frequency
and high-frequency energies (ET, EL, EM, EH), a circuit (SSF) for
computing said spectral stability function, emitting a signal (13)
indicating whether said spectral stability threshold has been
exceeded, a circuit (SMOT) for determining the values of a smoothed
intermediate-frequency energy function E'M, and of its maxima and
minima, and a processing circuit (ELB) for said end phase,
characterized in that it further comprises:

- a circuit (SIL) for detecting said silence class, which
receives the values of residual energy ER and total energy ET,
computes said reference energy ERIF and compares it with said
residual energy ER;

- a circuit (VNV) determining said subdivision into four
voicing degrees, on the basis of said first indication of
voiced or unvoiced sound, and of the computation of said
ratio between the energies at low and high frequencies EL,
EH;

- a circuit (COMP) for the comparison of said total energy ET
with said first and second energy thresholds (E1, E2);

- a first register (RG1) for combining data words, one per each
interval, composed of said silence class, said formant
frequencies Fr, said voicing degrees, said signal outgoing from
said circuit (COMP) for the total energy comparison, the
signal indicating whether the spectral energy threshold has
been exceeded, the values of the smoothed
intermediate-frequency energy function E'M and maximum and
minimum indications;

- a memory (MEM1) for the temporary storage of the last M data
words;

- a circuit (CLSS) for determining said phonetic classes upon
the analysis of said data words it receives via a data bus
(19) from said memory (MEM1), which it supplies with the
addresses on an address bus (20), said classes being supplied
to said processing circuit (ELB) for the end phase, with a
signal for the interval identification.

9. Device as in claim 8, characterized in that said circuit (CLSS)
for determining the phonetic classes essentially comprises:

- a first unit (IND1) for the sequential addressing of said
memory (MEM1) during the reading, said first unit being
stopped by a voice signal (Q̅), and restarted from the value
present on the address bus (20) of said memory by a first
recovery signal, and supplying, when active, a sequence of
pulses of interval identification;

- a second unit (IND2) for the sequential addressing of said
memory (MEM1) during reading, said second unit carrying out
at each activation a first and a second addressing and
disabling a third unit (IND3), the first addressing beginning
from N positions before the address present at the activation
on said address bus (20) and addressing 2N+1 subsequent
positions, the second addressing starting from said address
present at the activation on the address bus (20) and ending
when said second unit receives a second inhibiting signal
(32);

- said third unit (IND3) addressing during reading said memory
(MEM1) starting, at each activation, from the address present
on said address bus (20), disabling said second unit (IND2)
and being stopped by a third inhibiting signal (35);

- a second register (RQ), which temporarily stores and supplies
to the output a field of said data words, read in said memory
(MEM1), carrying the silence class (Q) or said voice signal
(Q̅), said silence class being supplied to said processing
circuit (ELB) for the end phase;

- a third register (RSF) which, when activated by said voice
signal (Q̅), temporarily stores a field of said data words,
read in the memory (MEM1), carrying said signal indicating
whether said spectral stability threshold has been exceeded,
and supplies to the output the activation signal (SF, S̅F̅)
for said second (IND2) or said third (IND3) unit depending on
whether said field indicates threshold exceeded or threshold
not exceeded;

- a circuit (VNS) for determining nasal, semivowel, vowel
classes and the presence of diphthongs on the basis of the
analysis of the fields of said data words present on the data
bus (19) carrying said formant frequencies Fr and the values
of the smoothed intermediate-frequency energy function E'M
and the indications of maxima and minima, said circuit (VNS)
being activated when said third unit (IND3) is activated and
supplying it with the control signals relevant to the memory
(MEM1) addressing, also supplying a sequence of pulses
identifying intervals, and being disabled when in a data word
there is present again said silence class or the indication
of spectral stability threshold exceeded, thus supplying said
third inhibiting signal (35); said nasal, semivowel, vowel
classes and diphthong presence being supplied to said
processing circuit (ELB) for the end phase;

- a first logic circuit (FPA) which generates condition signals
(37) for the identification of fricative, plosive, affricate
classes, on the basis of the analysis of the fields of said
data words present on the data bus (19), carrying the silence
class, voicing degrees, the comparisons of the total energy
with said first and second threshold (E2, E1); said first
logic circuit being activated by said first addressing
supplied by the second unit (IND2);

- a combinatory logic (LGC) of said condition signals (37),
emitting an identification signal (38) of voiced-fricative or
unvoiced-fricative, or plosive or affricate classes;

- a second logic circuit (DUR) for determining the number of
intervals labelled by fricative, plosive, affricate classes,
which receives said identification signal (38) from said
combinatory logic, the fields of said data words carrying
voicing degrees and the comparisons of the total energy with
the thresholds, which is activated by said second addressing
supplied by said second unit (IND2), which supplies one of
said fricative, plosive, affricate classes to said processing
circuit (ELB) for the end phase, and also supplies a sequence
of pulses identifying the intervals and, at the end of the
sequence, said second inhibiting signal (32), said second
inhibiting signal (32) and said third inhibiting signal (35)
being also said first recovery signal;

- a logic gate (P6) supplying said signal (40) for interval
identification to the processing circuit (ELB) for the end
phase on the basis of the sequences of pulses identifying the
intervals it receives from the first unit (IND1), from the
circuit (VNS) for the determination of nasal, semivowel and
vowel classes, and from said second logic circuit (DUR).

10. Device as in claim 9, characterized in that said first logic
circuit (FPA) basically comprises:

- a first counter (CA) of the number of preceding intervals in
which the silence class is present;

- a second counter (CB) of the number of preceding intervals
having a voiced or quasi-voiced degree;

- a third counter (CC) of the number of subsequent intervals
having an unvoiced or quasi-unvoiced degree;

- a fourth counter (CD) of the number of subsequent intervals
having a voiced or quasi-voiced degree;

- a fifth counter (CE) of the number of subsequent intervals
having a quasi-voiced or quasi-unvoiced degree;

- a sixth and a seventh counter (CF, CG) of the number of
subsequent intervals in which total energy does not exceed or
exceeds, respectively, said second threshold (E1);
said first to seventh counters supplying an active logic
level when exceeding half the maximum counting value;

- an eighth and a ninth counter (CH, CI) which supply an active
logic level if in said initial interval the total energy ET
is respectively less or greater than said first threshold
(E2);

- a circuit (DIN) allotting the pulses of said first addressing
to said counters as synchronism signals;

- a fourth register (RG3) which receives the outputs of said
counters and supplies them on the output as condition signals
(37) at the end of said first addressing.

11. Device as in claim 10, characterised in that said combinatory
logic (LGC) emits:

- a signal of voiced-fricative class (Fv1) if the outputs are
active of the following counters: first and fourth and eighth,
or first and fifth and eighth, or second and fourth, or yet
second and fifth;

- a signal of unvoiced-fricative class (Fn1) if the outputs are
active of the following counters: first and third and sixth,
or second and third;

- a first signal of plosive class (P1) if the outputs of said
first, fourth and ninth counters are active;

- a second signal of plosive class (P2) if the outputs of said
first, fifth and eighth counters are active;

- a signal of affricate class (A1) if the outputs of said
first, third and seventh counters are active.

12. Device as in claim 11, characterized in that said second logic
circuit (DUR) essentially comprises:

- a fifth register (RG4) which emits said voiced-fricative
class (Fv) if it receives said signal of voiced-fricative
class (Fv1) and as long as total energy is less than said
first threshold (E2);

- a sixth register (RG5) which emits said unvoiced-fricative
class (Fn) if it receives said signal of unvoiced-fricative
class (Fn1) and as long as the unvoiced or quasi-unvoiced
degree is present;

- a seventh register (RG6) which emits said affricate class (A)
if it receives said affricate class signal (A1) and as long
as unvoiced or quasi-unvoiced degree is present;

- an eighth register (RG7) which emits an active logic level if
it receives said second signal of plosive class (P2) and as
long as the quasi-voiced or quasi-unvoiced degree is present;
said fifth, sixth, seventh, eighth registers being
synchronized by the pulses of said second addressing;

- a logic gate (P15) which emits said plosive class (P) if the
output of said eighth register (RG7) is active or if it
receives said first signal of plosive class (P1);

- a ninth register (RG9) which supplies the pulses of said
second addressing (39) as a sequence identifying the
intervals if and as long as one of said fricative, plosive,
affricate classes is present;

- a tenth register (RG8) which emits said second inhibiting
signal (32) when said fifth, sixth, seventh, eighth registers
are disabled.